Abstract - A high-performance pipelined multiplier is described. Its high performance
results from a fast carry-save adder basic cell, which has a simple structure and is
suitable for the Gate Forest semicustom environment. The carry-save adder computes the
sum and carry within two gate delays. Results show that the proposed adder can operate
at 200 MHz in a 2-micron CMOS process and is expected to perform better in a Gate Forest
realization.

1 Introduction 

Multiplication is one of the most common and important operations in signal processing [1],
but it is time consuming. Much effort has been invested in improving multiplication speed.
Pipelined multipliers with different architectures have been proposed to enhance
performance [2-5]. However, most of this work has been devoted to improving performance
at the system level. Little has been done to enhance system performance by reducing the
propagation delay of the basic computation cell at the circuit level. It is well known
that the performance of a digital system depends not only on its architecture but also
on its implementation complexity. To enhance performance, it is necessary to reduce the
implementation complexity as much as possible.

For a pipelined multiplier, the essential component is the carry-save adder. The maximum
clock speed of the multiplier is determined by the time the basic carry-save adder cell
takes to form and add the partial product and to generate the carry. A carry-save adder
with low implementation complexity shortens these operations and directly raises the
maximum throughput rate of the multiplier. To achieve this goal, a high-performance
pipelined multiplier with a fast carry-save adder cell is proposed.

2 System Description 

The pipelined multiplier is constructed as a semi-systolic array [3]. A schematic of the
pipelined multiplier array is shown in figure 1. It has three basic components: the
carry-save adder, the half adder and the register. They are organized as the three basic
arrays of the pipelined multiplier. A two-phase clocking scheme is used to control the
data flow. It is well known
that the throughput rate of a pipelined multiplier is determined by the delay of its basic
cells, in this case the carry-save adder. To enhance system performance, it is necessary
to reduce the delay of the adder. This can be done either by improving the device
technology or by reducing the circuit complexity. The first approach is expensive and
cannot be controlled by the circuit designer, so the second approach is used in this paper.
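
The role of the carry-save adder in such an array can be illustrated with a small
behavioral model. The Python sketch below is not the circuit of this paper: it assumes
unsigned operands, models each row of carry-save adders as a word-wide 3:2 compression,
and uses an ordinary addition as a stand-in for whatever stage finally resolves the saved
carries. The function names are hypothetical.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Compress three words into a sum word and a carry word.

    Invariant: a + b + c == sum_word + carry_word. No carry ripples
    between bit positions, which is why each cell has a short delay.
    """
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def carry_save_multiply(x: int, y: int, width: int) -> int:
    """Multiply two unsigned `width`-bit numbers by accumulating
    partial products in carry-save form (behavioral model only)."""
    sum_word, carry_word = 0, 0
    for i in range(width):
        # Partial product for bit i of y: x shifted by i, or zero.
        partial = (x << i) if (y >> i) & 1 else 0
        sum_word, carry_word = carry_save_add(sum_word, carry_word, partial)
    # Assumed stand-in for the final carry-resolution stage.
    return sum_word + carry_word


if __name__ == "__main__":
    print(carry_save_multiply(13, 11, 4))
```

Because every row only performs a 3:2 compression, the per-stage delay in this model is
that of a single adder cell, which is the quantity the rest of the paper sets out to reduce.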

A fast carry-save adder is designed with a simple circuit structure that is suitable
for the Gate Forest semicustom environment [6]. Unlike traditional implementations, the
adder cell does not use multiple levels of NAND gates or complex logic gates to evaluate
the sum and carry.

3 Circuit Implementation 

Figure 2 shows the circuit schematic of the proposed adder cell. It consists of three
functional units: the sum evaluation unit, the carry evaluation unit and the partial
product generation unit. The sum is computed by an exclusive-or and equivalence dual pair.
Each of the gates has only five transistors and constitutes one level of logic. By taking
advantage of charge re-distribution, this arrangement performs better than using two
exclusive-or gates for sum evaluation. During the precharge phase, Phi1 is asserted. The
exclusive-or gate is isolated from the equivalence gate, since the inputs of the
equivalence gate are pulled to ground by the Nfets at node 1 and node 3. At the same time,
node 2 is precharged through the Pfet controlled by Phi2, which is not asserted. When Phi2
is asserted, charge at node 2 may be re-distributed to node 1, depending on the input
signal D-bar of the equivalence gate. This charge re-distribution phenomenon can speed up
the sum evaluation. By careful design, the charge stored at node 2 can be shared equally
with node 1 if the capacitances at node 1 and node 2 are made equal. For inputs AB of
‘01’ or ‘10’ together with D-bar of ‘1’, the logic ‘1’ signal from either A or B has to
propagate to node 2 through the exclusive-or gate and the equivalence gate, so that node 1
and node 2 are charged to logic ‘1’. Because node 1 and node 2 already share equal charge
(at voltage Vdd/2 ideally) when D-bar is ‘1’, the time needed for both nodes to be charged
to logic ‘1’ is reduced significantly, and so is the worst-case propagation delay of sum
evaluation. The same delay reduction applies when a logic ‘0’ has to propagate from input
A or B to the output node of the equivalence gate.
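
The logic and the charge-sharing argument above can be checked with a short Python sketch.
It rests on assumptions that the text does not state explicitly: that D-bar is the
complement of a third one-bit addend D, that node 1 starts at ground and node 2 at Vdd
before sharing, and that the nodes behave as ideal capacitors. The function names are
hypothetical.

```python
def xor_gate(a: int, b: int) -> int:
    """Exclusive-or of two one-bit inputs."""
    return a ^ b


def equivalence_gate(x: int, d_bar: int) -> int:
    """Equivalence (XNOR) of two one-bit inputs."""
    return 1 - (x ^ d_bar)


def sum_bit(a: int, b: int, d: int) -> int:
    """Sum of three one-bit addends via the XOR/equivalence pair.

    Assumption: the equivalence gate receives the complement of d,
    so XNOR(a XOR b, NOT d) equals a XOR b XOR d.
    """
    d_bar = 1 - d
    return equivalence_gate(xor_gate(a, b), d_bar)


def shared_voltage(vdd: float, c1: float, c2: float) -> float:
    """Ideal voltage after node 2 (capacitance c2, precharged to vdd)
    shares its charge with node 1 (capacitance c1, initially at 0 V).
    Charge conservation gives vdd * c2 / (c1 + c2)."""
    return vdd * c2 / (c1 + c2)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for d in (0, 1):
                print(a, b, d, "->", sum_bit(a, b, d))
    # Equal capacitances leave both nodes at half the supply voltage.
    print(shared_voltage(5.0, 1.0, 1.0))
```

With equal capacitances the shared node voltage is Vdd/2, so in this idealized model a
later transition to either logic level only has to cover half the voltage swing.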

For carry computation, a Manchester carry chain is used. This technique has been shown
to be very effective for carry propagation in parallel adders. Because of the simple
structure, the carry can be computed in at most two gate delays. The partial product term
is formed by an AND gate embodied in the carry-save adder. This eliminates the extra
propagation delay incurred when the partial product term is generated in the carry-save
adder before the sum and carry evaluations, as is usually done. In other words, the
partial product term generated here is used by the next carry-save adder cell. This
product term generation scheme has been shown to be very effective [7] in improving the
multiplier throughput rate, as the critical path delay is reduced to the delay of the
full adder only.
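
At the logic level, the carry and the early partial product described above can be sketched
as follows. This is a gate-level abstraction in Python, not the transistor circuit of
figure 2: it uses the usual generate and propagate signals of a Manchester carry chain and
assumes three one-bit inputs a, b and c_in. The function names are hypothetical.

```python
def manchester_carry(a: int, b: int, c_in: int) -> int:
    """Carry out of one cell, expressed with generate/propagate signals.

    generate:  both inputs are 1, so a carry is produced locally.
    propagate: exactly one input is 1, so the incoming carry passes through.
    Otherwise (both inputs 0) the carry is killed.
    """
    generate = a & b
    propagate = a ^ b
    return generate | (propagate & c_in)


def partial_product(x_bit: int, y_bit: int) -> int:
    """One-bit partial product formed by an AND gate. In the scheme
    described in the text it is produced one stage early, for use by
    the next carry-save adder cell."""
    return x_bit & y_bit


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for c_in in (0, 1):
                print(a, b, c_in, "->", manchester_carry(a, b, c_in))
```

Forming the partial product a stage ahead means that, in this model, the cell's critical
path contains only the sum and carry logic and not the AND gate.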

Figure 2: Circuit schematic of the carry-save adder. The basic cell precharges at PHI 1
and evaluates the sum, carry and partial product.

The half adder is implemented with an exclusive-or gate and an AND gate. Like the
carry-save adder, it has a simple structure. The register simply consists of inverters
for data storage and pass transistors for data flow control.
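
The half adder logic stated above is small enough to write out in full. The Python sketch
below is a behavioral illustration only, and the function name is hypothetical.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) for two one-bit inputs:
    the sum is the exclusive-or, the carry is the AND."""
    return a ^ b, a & b


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", half_adder(a, b))
```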

4 Circuit Performance 

The multiplier has been designed and verified. The carry-save adder was simulated with
MOSIS 2-micron N-well CMOS device parameters. The results show that the proposed adder can
operate at up to 200 MHz (figure 3). Better performance is expected in a Gate Forest
realization, since an internal gate delay of 0.45 ns has been documented for ring
oscillators [8].
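
For orientation, the two figures quoted above can be put on the same time scale. The
Python sketch below only does the arithmetic; the ring-oscillator delay is taken at face
value from the text and does not account for loading in a real adder cell, so the result
is a rough comparison and not a performance prediction. The names are hypothetical.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


ADDER_CLOCK_MHZ = 200.0        # simulated 2-micron CMOS result from the text
RING_OSC_GATE_DELAY_NS = 0.45  # Gate Forest ring-oscillator figure from the text

period_ns = clock_period_ns(ADDER_CLOCK_MHZ)
gate_delays_per_cycle = period_ns / RING_OSC_GATE_DELAY_NS

if __name__ == "__main__":
    print(f"{period_ns:.2f} ns per cycle, "
          f"about {gate_delays_per_cycle:.1f} gate delays of 0.45 ns each")
```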

5 Summary 

A high-performance pipelined multiplier has been described. The performance of the
multiplier is enhanced by a novel carry-save adder design that has a simple circuit
structure and is suitable for the Gate Forest semicustom environment. The sum and carry
can be evaluated in at most two gate delays. With the simple circuit structure and the
advantage of charge re-distribution, the multiplier can operate at up to 200 MHz when
simulated with MOSIS 2-micron N-well CMOS device parameters. Better performance is
expected when it is implemented in the Gate Forest environment.

Figure 3: Simulation results of the carry-save adder.